
Conversation


@hengtaoguo hengtaoguo commented Dec 6, 2025

Description

Reduce user friction in SFT/RL and fix broken links.

b/463394566
b/463409639
b/463409807
b/463396352
b/463393644

Tests

N/A

Checklist

Before submitting this PR, please make sure (put X in square brackets):

- [ ] I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
- [ ] I have added necessary comments in my code, particularly in hard-to-understand areas.
- [ ] I have run end-to-end tests and provided workload links above if applicable.
- [ ] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@hengtaoguo hengtaoguo force-pushed the hengtaoguo-grpo branch 2 times, most recently from 8629e8b to 5ae647f on December 10, 2025 06:32
@hengtaoguo hengtaoguo changed the title More UXR fixes Docs: Improve SFT/RL user experience Dec 10, 2025
@hengtaoguo hengtaoguo marked this pull request as ready for review December 10, 2025 18:11

## Create virtual environment and Install MaxText dependencies
- If you have already completed the [MaxText installation](https://github.com/AI-Hypercomputer/maxtext/blob/main/docs/guides/install_maxtext.md), you can skip to the next section for post-training dependencies installations. Otherwise, please install `MaxText` using the following commands before proceeding.
+ If you have already completed the [MaxText installation](../../install_maxtext.md), you can skip to the next section for post-training dependencies installations. Otherwise, please install `MaxText` using the following commands before proceeding.

Why do we need to change the link here?
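
For reference, the install flow that section points to is, in rough sketch, the usual clone-plus-venv pattern (the `pip install -e .` step is an assumption here; the authoritative commands live in `install_maxtext.md`):

```bash
# Sketch only -- see install_maxtext.md for the authoritative steps.
git clone https://github.com/AI-Hypercomputer/maxtext.git
cd maxtext
python3 -m venv ~/venv-maxtext          # create an isolated virtual environment
source ~/venv-maxtext/bin/activate      # activate it
pip install -e .                        # assumed editable install; the guide may use a setup script instead
```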


export RUN_NAME=<name for this run> # e.g., $(date +%Y-%m-%d-%H-%M-%S)
- export MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items
+ export MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items # Actual checkpoint saved with an extra /0/items path suffix

This doesn't look right if the user has the checkpoint in GCS. We can remove this env variable from here and move it to the next section, similar to https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft.html#get-your-model-checkpoint.
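
As an illustration of the suggested pattern, a checkpoint in GCS could be wired up along these lines (the bucket and run names below are hypothetical placeholders):

```bash
# Hypothetical example: point MAXTEXT_CKPT_PATH at an existing checkpoint,
# whether it lives locally or in a GCS bucket.
export BASE_OUTPUT_DIRECTORY=gs://my-bucket/maxtext-runs   # hypothetical bucket
export RUN_NAME=2025-12-10-06-32                           # hypothetical run name
# Orbax saves the actual checkpoint under an extra /0/items suffix:
export MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items
```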

The overview of what this run will do is as follows:

- 1. We load a policy model and a reference model. Both are copies of `Llama3.1-8b-Instruct`.
+ 1. We load a policy model and a reference model. Both are copies of the model checkpoint you specified (e.g., `Llama3.1-8b-Instruct`).

Can you do the same at line 128?


## 2. Install XPK
- Install XPK by following the instructions in the [official documentation](https://github.com/AI-Hypercomputer/xpk?tab=readme-ov-file#installation-via-pip).
+ Install XPK by following the instructions in the [official documentation](https://github.com/AI-Hypercomputer/xpk?tab=readme-ov-file#installation-via-pip). We also provide a quick guide for XPK installation and usage [here](https://maxtext.readthedocs.io/en/latest/run_maxtext/run_maxtext_via_xpk.html).

nit: "We also provide a quick guide for XPK installation here."
That XPK documentation mainly covers pre-training, so pointing users to it at this point might create confusion. Can we explicitly say to follow that guide only for the XPK installation & prerequisites, and to continue with the current doc for post-training?
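
For context, the pip route in the official documentation amounts to a single install command, sketched below (check the XPK README for version prerequisites):

```bash
# Install XPK from PyPI, per the "installation via pip" section of the XPK README.
pip install xpk
xpk --help   # sanity-check the installation
```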

## Submit your RL workload via Pathways

- Please create a pathways ready GKE cluster as described [here](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/create-gke-cluster), and you can submit the `train_rl.py` script via [XPK](https://github.com/AI-Hypercomputer/xpk).
+ Please create a pathways ready GKE cluster as described [here](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/create-gke-cluster), and you can submit the `train_rl.py` script via [XPK](https://github.com/AI-Hypercomputer/xpk). We also provide a quick guide for XPK installation and usage [here](../../run_maxtext/run_maxtext_via_xpk.md).

Similar comment
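
For illustration only, an XPK submission of `train_rl.py` might look roughly like the sketch below; the cluster name, TPU type, workload name, and flag values are assumptions, not values from this PR:

```bash
# Hypothetical sketch of submitting train_rl.py to a Pathways-ready GKE
# cluster via XPK; names and flags are placeholders, not verified values.
xpk workload create-pathways \
  --cluster=my-pathways-cluster \
  --tpu-type=v5p-8 \
  --num-slices=1 \
  --workload=rl-llama3-8b \
  --command="python3 train_rl.py ..."   # args elided; see the RL tutorial
```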

